{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 提示工程从入门到放弃\n",
    "by DAIR.AI | Elvis Saravia\n",
    "\n",
    "\n",
    "本笔记本包含了例子和练习，让你学习提示工程。\n",
    "\n",
    "我们将使用[OpenAI APIs](https://platform.openai.com/)来使用所有例子。我为所有例子使用默认设置`temperature=0.7`和`top-p=1`。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 基础提示工程\n",
    "\n",
    "目标\n",
    "- 加载库\n",
    "- 复习格式\n",
    "- 涵盖基本提示\n",
    "- 复习常见用例"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们加载必要的库、实用程序和配置。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "import IPython\n",
    "from langchain.llms import OpenAI\n",
    "from dotenv import load_dotenv"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "确保用自己的密钥替换`OPENAI_KEY`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv()\n",
    "\n",
    "# API configuration\n",
    "openai.api_key = os.getenv(\"OPENAI_KEY\")\n",
    "\n",
    "# for LangChain\n",
    "os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_KEY\")\n",
    "os.environ[\"SERPAPI_API_KEY\"] = os.getenv(\"SERPAPI_API_KEY\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def set_open_params(\n",
    "    model=\"text-davinci-003\",\n",
    "    temperature=0.7,\n",
    "    max_tokens=256,\n",
    "    top_p=1,\n",
    "    frequency_penalty=0,\n",
    "    presence_penalty=0,\n",
    "):\n",
    "    \"\"\" set openai parameters\"\"\"\n",
    "\n",
    "    openai_params = {}    \n",
    "\n",
    "    openai_params['model'] = model\n",
    "    openai_params['temperature'] = temperature\n",
    "    openai_params['max_tokens'] = max_tokens\n",
    "    openai_params['top_p'] = top_p\n",
    "    openai_params['frequency_penalty'] = frequency_penalty\n",
    "    openai_params['presence_penalty'] = presence_penalty\n",
    "    return openai_params\n",
    "\n",
    "def get_completion(params, prompt):\n",
    "    \"\"\" GET completion from openai api\"\"\"\n",
    "\n",
    "    response = openai.Completion.create(\n",
    "        engine = params['model'],\n",
    "        prompt = prompt,\n",
    "        temperature = params['temperature'],\n",
    "        max_tokens = params['max_tokens'],\n",
    "        top_p = params['top_p'],\n",
    "        frequency_penalty = params['frequency_penalty'],\n",
    "        presence_penalty = params['presence_penalty'],\n",
    "    )\n",
    "    return response"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Basic prompt example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# basic example\n",
    "params = set_open_params()\n",
    "\n",
    "prompt = \"The sky is\"\n",
    "\n",
    "response = get_completion(params, prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' blue\\n\\nThe sky is blue because of the way that the atmosphere scatters the sunlight. The blue wavelengths of visible light are scattered more than other wavelengths. This is why the sky appears blue from the ground.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response.choices[0].text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " blue\n",
       "\n",
       "The sky is blue because of the way that the atmosphere scatters the sunlight. The blue wavelengths of visible light are scattered more than other wavelengths. This is why the sky appears blue from the ground."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "尝试使用不同的温度进行比较："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " blue\n",
       "\n",
       "The sky is blue because of the way the atmosphere scatters sunlight. When sunlight passes through the atmosphere, the blue wavelengths are scattered more than the other colors, making the sky appear blue."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = set_open_params(temperature=0)\n",
    "prompt = \"The sky is\"\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 文本摘要"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "\n",
       "Antibiotics are medications used to treat bacterial infections by either killing the bacteria or stopping them from reproducing, but they are not effective against viral infections and can lead to antibiotic resistance if used incorrectly."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = set_open_params(temperature=0.7)\n",
    "prompt = \"\"\"Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. \n",
    "\n",
    "Explain the above in one sentence:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "练习：指示模型用一句话来解释这段文字，就像“我五岁”一样。你看到了什么不同之处吗？"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 问答"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " Mice."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.\n",
    "\n",
    "Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n",
    "\n",
    "Question: What was OKT3 originally sourced from?\n",
    "\n",
    "Answer:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上下文来源: https://www.nature.com/articles/d41586-023-00400-x"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "练习：编辑提示，让模型回复它对答案不太确定。 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 文本分类"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " Neutral"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"Classify the text into neutral, negative or positive.\n",
    "\n",
    "Text: I think the food was okay.\n",
    "\n",
    "Sentiment:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "练习：修改提示，指示模型为所选答案提供解释。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 角色扮演"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " Sure. Black holes are regions of spacetime exhibiting gravitational acceleration so strong that nothing—not even particles and electromagnetic radiation such as light—can escape from inside it. The theory of general relativity predicts that a sufficiently compact mass can deform spacetime to form a black hole. The boundary of the region from which no escape is possible is called the event horizon. Although crossing the event horizon has enormous effect on the fate of the object crossing it, no locally detectable features appear to be observed. In many ways, a black hole acts like an idealized black body, as it reflects no light."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.\n",
    "\n",
    "Human: Hello, who are you?\n",
    "AI: Greeting! I am an AI research assistant. How can I help you today?\n",
    "Human: Can you tell me about the creation of blackholes?\n",
    "AI:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "练习：修改提示，要求模型保持AI响应简洁、简短。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 代码生成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "\n",
       "SELECT students.StudentId, students.StudentName \n",
       "FROM students \n",
       "INNER JOIN departments ON students.DepartmentId = departments.DepartmentId\n",
       "WHERE departments.DepartmentName = 'Computer Science';"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\\\"\\\"\\\"\\nTable departments, columns = [DepartmentId, DepartmentName]\\nTable students, columns = [DepartmentId, StudentId, StudentName]\\nCreate a MySQL query for all students in the Computer Science Department\\n\\\"\\\"\\\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 推理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "\n",
       "\n",
       "The odd numbers in this group are: 5, 13, 7, and 1. \n",
       "\n",
       "5 + 13 + 7 + 1 = 26 \n",
       "\n",
       "26 is an even number."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. \n",
    "\n",
    "Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even.\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "练习：改进提示，以获得更好的结构和输出格式。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 高级提示技巧\n",
    "\n",
    "目标:\n",
    "\n",
    "- 掌握更高级的提示技巧：少量提示、联想提示等"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 少量提示"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " The answer is False."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.\n",
    "A: The answer is False.\n",
    "\n",
    "The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.\n",
    "A: The answer is True.\n",
    "\n",
    "The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.\n",
    "A: The answer is True.\n",
    "\n",
    "The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.\n",
    "A: The answer is False.\n",
    "\n",
    "The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. \n",
    "A:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3  联想提示"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       " Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.\n",
    "A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.\n",
    "\n",
    "The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. \n",
    "A:\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 零点提示"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "\n",
       "\n",
       "You initially bought 10 apples.\n",
       "\n",
       "You then gave away 4 apples, leaving you with 6 apples.\n",
       "\n",
       "You bought 5 more apples, so now you have 11 apples.\n",
       "\n",
       "After eating 1 apple, you have 10 apples remaining."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"\"\"I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?\n",
    "\n",
    "Let's think step by step.\"\"\"\n",
    "\n",
    "response = get_completion(params, prompt)\n",
    "IPython.display.Markdown(response.choices[0].text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 自我一致性\n",
    "练习一下，检查我们[指南](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-advanced-usage.md#self-consistency)中的示例，并在这里尝试它们。\n",
    "\n",
    "### 2.6 知识生成提示\n",
    "\n",
    "练习一下，检查我们[指南](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-advanced-usage.md#generated-knowledge-prompting)中的示例，并在这里尝试它们。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.6 PAL - 以代码作为推理"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们正在开发一个简单的应用程序，该应用程序能够通过代码对问题进行推理。 \n",
    "\n",
    "具体来说，该应用程序接受一些数据，并回答有关输入数据的问题。 提示包括一些案例，这些案例是从[这里](https://github.com/reasoning-machines/pal/blob/main/pal/prompt/penguin_prompt.py)采用的。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# lm instance\n",
    "llm = OpenAI(model_name='text-davinci-003', temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"Which is the oldest penguin?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "PENGUIN_PROMPT = '''\n",
    "\"\"\"\n",
    "Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
    "name, age, height (cm), weight (kg) \n",
    "Louis, 7, 50, 11\n",
    "Bernard, 5, 80, 13\n",
    "Vincent, 9, 60, 11\n",
    "Gwen, 8, 70, 15\n",
    "For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. \n",
    "We now add a penguin to the table:\n",
    "James, 12, 90, 12\n",
    "How many penguins are less than 8 years old?\n",
    "\"\"\"\n",
    "# Put the penguins into a list.\n",
    "penguins = []\n",
    "penguins.append(('Louis', 7, 50, 11))\n",
    "penguins.append(('Bernard', 5, 80, 13))\n",
    "penguins.append(('Vincent', 9, 60, 11))\n",
    "penguins.append(('Gwen', 8, 70, 15))\n",
    "# Add penguin James.\n",
    "penguins.append(('James', 12, 90, 12))\n",
    "# Find penguins under 8 years old.\n",
    "penguins_under_8_years_old = [penguin for penguin in penguins if penguin[1] < 8]\n",
    "# Count number of penguins under 8.\n",
    "num_penguin_under_8 = len(penguins_under_8_years_old)\n",
    "answer = num_penguin_under_8\n",
    "\"\"\"\n",
    "Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
    "name, age, height (cm), weight (kg) \n",
    "Louis, 7, 50, 11\n",
    "Bernard, 5, 80, 13\n",
    "Vincent, 9, 60, 11\n",
    "Gwen, 8, 70, 15\n",
    "For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.\n",
    "Which is the youngest penguin?\n",
    "\"\"\"\n",
    "# Put the penguins into a list.\n",
    "penguins = []\n",
    "penguins.append(('Louis', 7, 50, 11))\n",
    "penguins.append(('Bernard', 5, 80, 13))\n",
    "penguins.append(('Vincent', 9, 60, 11))\n",
    "penguins.append(('Gwen', 8, 70, 15))\n",
    "# Sort the penguins by age.\n",
    "penguins = sorted(penguins, key=lambda x: x[1])\n",
    "# Get the youngest penguin's name.\n",
    "youngest_penguin_name = penguins[0][0]\n",
    "answer = youngest_penguin_name\n",
    "\"\"\"\n",
    "Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
    "name, age, height (cm), weight (kg) \n",
    "Louis, 7, 50, 11\n",
    "Bernard, 5, 80, 13\n",
    "Vincent, 9, 60, 11\n",
    "Gwen, 8, 70, 15\n",
    "For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.\n",
    "What is the name of the second penguin sorted by alphabetic order?\n",
    "\"\"\"\n",
    "# Put the penguins into a list.\n",
    "penguins = []\n",
    "penguins.append(('Louis', 7, 50, 11))\n",
    "penguins.append(('Bernard', 5, 80, 13))\n",
    "penguins.append(('Vincent', 9, 60, 11))\n",
    "penguins.append(('Gwen', 8, 70, 15))\n",
    "# Sort penguins by alphabetic order.\n",
    "penguins_alphabetic = sorted(penguins, key=lambda x: x[0])\n",
    "# Get the second penguin sorted by alphabetic order.\n",
    "second_penguin_name = penguins_alphabetic[1][0]\n",
    "answer = second_penguin_name\n",
    "\"\"\"\n",
    "{question}\n",
    "\"\"\"\n",
    "'''.strip() + '\\n'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "既然我们已经有了提示和问题，我们可以将其发送给模型。它应该输出所需的代码步骤，以获得答案的解决方案。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Put the penguins into a list.\n",
      "penguins = []\n",
      "penguins.append(('Louis', 7, 50, 11))\n",
      "penguins.append(('Bernard', 5, 80, 13))\n",
      "penguins.append(('Vincent', 9, 60, 11))\n",
      "penguins.append(('Gwen', 8, 70, 15))\n",
      "# Sort the penguins by age.\n",
      "penguins = sorted(penguins, key=lambda x: x[1], reverse=True)\n",
      "# Get the oldest penguin's name.\n",
      "oldest_penguin_name = penguins[0][0]\n",
      "answer = oldest_penguin_name\n"
     ]
    }
   ],
   "source": [
    "llm_out = llm(PENGUIN_PROMPT.format(question=question))\n",
    "print(llm_out)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Vincent\n"
     ]
    }
   ],
   "source": [
    "exec(llm_out)\n",
    "print(answer)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "答對了！Vincent是最老的企鵝。 "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise: Try a different question and see what's the result."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Tools and Applications\n",
    "\n",
    "Objective:\n",
    "\n",
    "- Demonstrate how to use LangChain to demonstrate simple applications using prompting techniques and LLMs"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 LLMs & External Tools\n",
    "\n",
    "Example adopted from the [LangChain documentation](https://langchain.readthedocs.io/en/latest/modules/agents/getting_started.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents import load_tools\n",
    "from langchain.agents import initialize_agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(temperature=0)\n",
    "\n",
    "tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
    "agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
      "Action: Search\n",
      "Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
      "Action: Search\n",
      "Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3m47 years\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I need to calculate 47 raised to the 0.23 power\n",
      "Action: Calculator\n",
      "Action Input: 47^0.23\u001b[0m\n",
      "Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
      "\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\""
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# run the agent\n",
    "agent.run(\"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Data-Augmented Generation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need to download the data we want to use as source to augment generation.\n",
    "\n",
    "Code example adopted from [LangChain Documentation](https://langchain.readthedocs.io/en/latest/modules/chains/combine_docs_examples/qa_with_sources.html). We are only using the examples for educational purposes."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Prepare the data first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.embeddings.cohere import CohereEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.docstore.document import Document\n",
    "from langchain.prompts import PromptTemplate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('./state_of_the_union.txt') as f:\n",
    "    state_of_the_union = f.read()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_text(state_of_the_union)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Chroma using direct local API.\n",
      "Using DuckDB in-memory for database. Data will be transient.\n"
     ]
    }
   ],
   "source": [
    "docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{\"source\": str(i)} for i in range(len(texts))])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"What did the president say about Justice Breyer\"\n",
    "docs = docsearch.similarity_search(query)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's quickly test it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
    "from langchain.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'output_text': ' The president thanked Justice Breyer for his service.\\nSOURCES: 30-pl'}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\")\n",
    "query = \"What did the president say about Justice Breyer\"\n",
    "chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try a question with a custom prompt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'PromptTemplate' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[1], line 13\u001b[0m\n\u001b[1;32m      1\u001b[0m template \u001b[39m=\u001b[39m \u001b[39m\"\"\"\u001b[39m\u001b[39mGiven the following extracted parts of a long document and a question, create a final answer with references (\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mSOURCES\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m). \u001b[39m\n\u001b[1;32m      2\u001b[0m \u001b[39mIf you don\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt know the answer, just say that you don\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt know. Don\u001b[39m\u001b[39m'\u001b[39m\u001b[39mt try to make up an answer.\u001b[39m\n\u001b[1;32m      3\u001b[0m \u001b[39mALWAYS return a \u001b[39m\u001b[39m\"\u001b[39m\u001b[39mSOURCES\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m part in your answer.\u001b[39m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m      9\u001b[0m \u001b[39m=========\u001b[39m\n\u001b[1;32m     10\u001b[0m \u001b[39mFINAL ANSWER IN SPANISH:\u001b[39m\u001b[39m\"\"\"\u001b[39m\n\u001b[1;32m     12\u001b[0m \u001b[39m# create a prompt template\u001b[39;00m\n\u001b[0;32m---> 13\u001b[0m PROMPT \u001b[39m=\u001b[39m PromptTemplate(template\u001b[39m=\u001b[39mtemplate, input_variables\u001b[39m=\u001b[39m[\u001b[39m\"\u001b[39m\u001b[39msummaries\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mquestion\u001b[39m\u001b[39m\"\u001b[39m])\n\u001b[1;32m     15\u001b[0m \u001b[39m# query \u001b[39;00m\n\u001b[1;32m     16\u001b[0m chain \u001b[39m=\u001b[39m load_qa_with_sources_chain(OpenAI(temperature\u001b[39m=\u001b[39m\u001b[39m0\u001b[39m), chain_type\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mstuff\u001b[39m\u001b[39m\"\u001b[39m, prompt\u001b[39m=\u001b[39mPROMPT)\n",
      "\u001b[0;31mNameError\u001b[0m: name 'PromptTemplate' is not defined"
     ]
    }
   ],
   "source": [
    "template = \"\"\"Given the following extracted parts of a long document and a question, create a final answer with references (\"SOURCES\"). \n",
    "If you don't know the answer, just say that you don't know. Don't try to make up an answer.\n",
    "ALWAYS return a \"SOURCES\" part in your answer.\n",
    "Respond in Spanish.\n",
    "\n",
    "QUESTION: {question}\n",
    "=========\n",
    "{summaries}\n",
    "=========\n",
    "FINAL ANSWER IN SPANISH:\"\"\"\n",
    "\n",
    "# create a prompt template\n",
    "PROMPT = PromptTemplate(template=template, input_variables=[\"summaries\", \"question\"])\n",
    "\n",
    "# query \n",
    "chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\", prompt=PROMPT)\n",
    "query = \"What did the president say about Justice Breyer?\"\n",
    "chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise: Try using a different dataset from the internet and try different prompt, including all the techniques you learned in the lecture."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "promptlecture",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "f38e0373277d6f71ee44ee8fea5f1d408ad6999fda15d538a69a99a1665a839d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
